METHOD, SYSTEM AND RECORDING MEDIUM FOR
MAINTAINING THE ORDER OF NODES IN A HIERARCHICAL
DOCUMENT

BACKGROUND OF THE INVENTION 

Field of the Invention 
An exemplary embodiment of the invention generally relates to the
maintenance of order of nodes in a hierarchical document. More particularly,
an exemplary embodiment of the invention relates to a method and system for
maintaining the order of nodes in a hierarchical document.

With the advent of XML as a data representation format, there is an
increasing need for robust, high performance XML database systems. Most of
the recent work focuses on efficient XML query processing, while updates
have received less attention by comparison.

SUMMARY OF THE INVENTION 
In order to speed up query processing, various labeling schemes have
been proposed. However, the vast majority of these schemes have very bad
update performance.

What is needed is an order-preserving labeling scheme having a low 
update cost and a minimum of bits per label. 

In a first exemplary aspect of the present invention, a method of
maintaining the order of nodes in a hierarchical document includes selecting a
first parameter corresponding to a selected maximum number of children for
each node for an auxiliary ordered tree, selecting a second parameter
corresponding to a selected minimum number of children for each node for the
auxiliary ordered tree, building the auxiliary ordered tree having at least as
many leaves as atoms within the hierarchical document based upon the first
and second parameters, attaching the atoms to the leaves of the auxiliary
ordered tree, and labeling each of the nodes in the auxiliary ordered tree.

In a second exemplary aspect of the present invention, a method of
updating an auxiliary ordered tree having at least as many leaves as atoms
within a hierarchical document, the tree being based upon a selected maximum
number of children for each node and a selected minimum number of children
for each node, includes receiving a request to insert a new atom at a
specific position in the hierarchical document, inserting a new leaf in the
auxiliary ordered tree based on the specific position of the corresponding atom
in the hierarchical document, and searching for the highest ancestor node of
the new leaf that has a number of leaves that equals or exceeds the selected
maximum number of leaves. If no such ancestor node is found, the method
includes re-labeling the sub-tree rooted at the parent node of the new leaf.
If such an ancestor node is found, the method includes determining whether
the ancestor node is the root node; if the ancestor node is the root node,
creating a new root having a predetermined number of children; if the
ancestor node is not the root node, splitting the ancestor node into complete
sub-trees that have the same leaf sequence as the ancestor node's sub-tree;
and reassigning labels in a top-down fashion in the sub-tree rooted at the
parent of the ancestor node.

In a third exemplary aspect of the present invention, a method of
optimizing an auxiliary ordered tree having at least as many leaves as atoms
within a hierarchical document, the shape of the auxiliary ordered tree being
based upon a selected maximum number of children for each node and a
selected minimum number of children for each node, includes adjusting the
selected maximum number of children for each node and the selected
minimum number of children for each node of the auxiliary ordered tree based
upon application requirements regarding one of update cost, total cost of
queries and updates, and the size of the labels.

In a fourth exemplary aspect of the present invention, a method of
encoding an auxiliary ordered tree having at least as many leaves as atoms
within a hierarchical document, the shape of the auxiliary ordered tree being
based upon a selected maximum number of children for each node and a
selected minimum number of children for each node, includes
minimizing space requirements using a virtual tree.

In a fifth exemplary aspect of the present invention, a system for 
maintaining the order of nodes in a hierarchical document includes means for 
selecting a first parameter corresponding to a selected maximum number of 
children for each node for an auxiliary ordered tree, means for selecting a 
second parameter corresponding to a selected minimum number of children 
for each node for an auxiliary ordered tree, means for building the auxiliary 
ordered tree having at least as many leaves as atoms within the hierarchical 
document based upon the first and second parameters, means for attaching the 
atoms to the leaves of the auxiliary ordered tree, and means for labeling each 
of the nodes in the auxiliary ordered tree. 

In a sixth exemplary aspect of the present invention, a recording
medium stores a program for making a computer maintain the order of nodes
in a hierarchical document. The program includes instructions for selecting a
first parameter corresponding to a selected maximum number of children for 
each node for an auxiliary ordered tree, instructions for selecting a second 
parameter corresponding to a selected minimum number of children for each 
node for an auxiliary ordered tree, instructions for building the auxiliary 
ordered tree having at least as many leaves as atoms within the hierarchical 
document based upon the first and second parameters, instructions for 
attaching the atoms to the leaves of the auxiliary ordered tree, and instructions 
for labeling each of the nodes in the auxiliary ordered tree. 

In a seventh exemplary aspect of the present invention, a system for
updating an auxiliary ordered tree having at least as many leaves as atoms
within a hierarchical document based upon a selected maximum number of
children for each node and a selected minimum number of children for each
node includes means for receiving a request to insert a new atom at a
specific position in the hierarchical document, means for inserting a new leaf
in the auxiliary ordered tree based on the specific position of the
corresponding atom in the hierarchical document, means for searching for the
highest ancestor node of the new leaf that has a number of leaves that equals
or exceeds the selected maximum number of leaves, means for re-labeling the
sub-tree rooted at the parent node of the new leaf if no such ancestor node
is found, means for determining whether the ancestor node is the root node
if such an ancestor node is found, means for creating a new root having a
predetermined number of children if the ancestor node is the root node,
means for splitting the ancestor node into complete sub-trees that have the
same leaf sequence as the ancestor node's sub-tree if the ancestor node is
not the root node, and means for reassigning labels in a top-down fashion in
the sub-tree rooted at the parent of the ancestor node.

In an eighth exemplary aspect of the present invention, a recording
medium storing a program for making a computer update an auxiliary ordered
tree having at least as many leaves as atoms within a hierarchical document
based upon a selected maximum number of children for each node and a
selected minimum number of children for each node includes instructions for
receiving a request to insert a new atom at a specific position in the
hierarchical document, instructions for inserting a new leaf in the auxiliary
ordered tree based on the specific position of the corresponding atom in the
hierarchical document, instructions for searching for the highest ancestor node
of the new leaf that has a number of leaves that equals or exceeds the selected
maximum number of leaves, instructions for re-labeling the sub-tree rooted at
the parent node of the new leaf if no such ancestor node is found,
instructions for determining whether the ancestor node is the root node if
such an ancestor node is found, instructions for creating a new root having a
predetermined number of children if the ancestor node is the root node,
instructions for splitting the ancestor node into complete sub-trees that have
the same leaf sequence as the ancestor node's sub-tree if the ancestor node is
not the root node, and instructions for reassigning labels in a top-down
fashion in the sub-tree rooted at the parent of the ancestor node.

An exemplary embodiment of the present invention provides an
order-preserving labeling scheme having a low update cost and that minimizes
the number of bits per label.

BRIEF DESCRIPTION OF THE DRAWINGS 

The foregoing and other purposes, aspects and advantages will be
better understood from the following detailed description of exemplary
embodiments of the invention with reference to the drawings, in which:

Figure 1 illustrates an exemplary hardware configuration for 
maintaining the order of nodes in a hierarchical document; 

Figure 2 illustrates an exemplary recording medium that stores
instructions for maintaining the order of nodes in a hierarchical document;

Figure 3 illustrates an exemplary XML document; 
Figure 4 illustrates a conventional tree representation for the XML 
document of Figure 3; 

Figure 5 illustrates an exemplary embodiment of the invention that
maintains the order of nodes in a hierarchical document;

Figure 6A illustrates an exemplary XML document tree; 
Figure 6B illustrates an exemplary embodiment of an auxiliary ordered 
tree in accordance with the present invention; 

Figure 7 illustrates an exemplary embodiment of the invention that
updates an auxiliary ordered tree in accordance with the present invention;

Figure 8 illustrates an exemplary control routine for maintaining the
order of nodes in a hierarchical document in accordance with the present
invention;

Figure 9A illustrates another exemplary XML document tree; 
Figure 9B illustrates another exemplary embodiment of an auxiliary 
ordered tree in accordance with the present invention; 

Figures 10A and 10C illustrate yet another exemplary XML document
tree;

Figures 10B and 10D illustrate yet another exemplary embodiment of
an auxiliary ordered tree in accordance with the present invention;

Figure 11 is a graph that illustrates the amortized cost upper bound
for experimental results and theoretical results;

Figure 12 is a graph that illustrates the amortized update cost for a
fixed s and a varying f, for experimental results and theoretical results;

Figure 13 is a graph that illustrates the amortized update cost that may
be achieved for a varying s and a fixed f, for experimental results and
theoretical results; and

Figure 14 is a graph that illustrates optimal cost with bit constraints. 

DETAILED DESCRIPTION OF EXEMPLARY 
EMBODIMENTS OF THE INVENTION 
Referring now to the drawings, and more particularly to Figures 1-14,
there are shown exemplary embodiments of the method and structures
according to the present invention.

Figure 1 illustrates a typical hardware configuration of a system 100 for
maintaining the order of nodes in a hierarchical document for use with the
invention and which preferably has at least one processor or central processing
unit (CPU) 111.

The CPUs 111 are interconnected via a system bus 112 to a random
access memory (RAM) 114, read-only memory (ROM) 116, input/output (I/O)
adapter 118 (for connecting peripheral devices such as disk units 121 and tape
drives 140 to the bus 112), user interface adapter 122 (for connecting a
keyboard 124, mouse 126, speaker 128, microphone 132, and/or other user
interface device to the bus 112), a communication adapter 134 for connecting
an information handling system to a data processing network, the Internet, an
Intranet, a personal area network (PAN), etc., and a display adapter 136 for
connecting the bus 112 to a display device 138 and/or printer 140.

YOR920030239US1

In addition to the hardware/software environment described above, a
different aspect of the invention includes a computer-implemented method for
performing the above method. As an example, this method may be
implemented in the particular environment discussed above.

Such a method may be implemented, for example, by operating a
computer, as embodied by a digital data processing apparatus, to execute a
sequence of machine-readable instructions. These instructions may reside in
various types of signal-bearing media.

These signal-bearing media may include, for example, a RAM contained
within the CPU 111, as represented by the fast-access storage for example.
Alternatively, the instructions may be contained in another signal-bearing
medium, such as a magnetic data storage diskette 200 (Figure 2), directly or
indirectly accessible by the CPU 111.

Whether contained in the diskette 200, the computer/CPU 111, or
elsewhere, the instructions may be stored on a variety of machine-readable
data storage media, such as DASD storage (e.g., a conventional "hard drive"
or a RAID array), magnetic tape, electronic read-only memory (e.g., ROM,
EPROM, or EEPROM), an optical storage device (e.g., CD-ROM, WORM,
DVD, digital optical tape, etc.), paper "punch" cards, or other suitable
signal-bearing media including transmission media such as digital and analog
communication links and wireless links. In an illustrative embodiment of the
invention, the machine-readable instructions may comprise software object
code, compiled from a language such as "C", etc.

With the advent of XML as a data representation format, there is an
increasing need for robust, high performance XML database management
systems. Historically, XML is the successor of earlier document markup
languages such as SGML and HTML, and as such, XML is primarily a
document format and, therefore, is fundamentally different from the type of
relational data that may be encountered in typical business applications.

Among the most prominent distinctive features of XML are an irregular,
self-descriptive, potentially recursive structure, and an implicit order among
data elements, which is the so-called document order.

Figure 3 shows an example of an XML document. In Figure 3, the
words enclosed in angle brackets are referred to as "tags." More precisely, the
"begin tags" are of the form "<A>" and the "end tags" are of the form "</A>".
The begin and end tags need to be properly nested.

In addition to tags there is free text, which can be divided into text
"segments" (a maximal sequence of consecutive characters not containing any
tags). For example, as shown in Fig. 3, "ThinkPad T20", "John Smith", "This
laptop is the", "best value", and "in its class" are free text.

It is common practice to represent the content of an XML document
using a tree diagram. An example of a tree diagram for the XML document of
Figure 3 is shown in Figure 4.

The nodes in the tree diagram 400 of the XML document that are
annotated with tag names are called "element" nodes and the others (annotated
with text segments) are called "text" nodes. The numbers associated with each
node will be described later.

This detailed description does not include illustrations having
other features defined in the XML standard, such as attributes, comments,
processing instructions, and namespace declarations. These other features may
be treated in a similar fashion as elements and text segments.

An XML database should be able to efficiently retrieve ordered XML
fragments according to patterns specified in a query language like XPath or
XQuery. Since it needs to operate at the item level, an XML database has
to decompose documents into atomic data items (e.g., elements, attributes, text
segments, etc.). The database also needs a mechanism for recording the
relative position of these data items because it needs to preserve document
order.

One conventional method for maintaining document order, which also
helps in query processing, assigns ordered labels to data items. Thus, if we
treat an XML document as an ordered tree T, the method traverses T
depth-first and assigns ordered labels to nodes. Each node x receives two
labels: the first label, Bx, is assigned when node x is first visited, and the
second label, Ex, is assigned when node x is exited. Therefore, this method
assigns two different labels to element nodes and two identical labels to
text nodes.

The labels of every node of the tree shown in Figure 4 are the numbers
in parentheses next to the node. Since the two labels for text nodes
are identical, they are shown as a single number for simplicity.
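The depth-first labeling just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the labeling code of any particular system: the `Node` class, the `assign_labels` function and the starting label 0 are assumptions introduced here (0 is chosen because it is consistent with the "Name" labels (2,4) quoted for Figure 4).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical XML tree node: an element (tag) or a text segment."""
    name: str
    is_text: bool = False
    children: List["Node"] = field(default_factory=list)
    begin: Optional[int] = None   # label Bx, assigned on first visit
    end: Optional[int] = None     # label Ex, assigned on exit


def assign_labels(root: Node, start: int = 0) -> int:
    """Label the tree depth-first; return the next unused label."""
    counter = start

    def visit(node: Node) -> None:
        nonlocal counter
        node.begin = counter
        counter += 1
        if node.is_text:
            # A text node is a single atom, so both labels coincide.
            node.end = node.begin
            return
        for child in node.children:
            visit(child)
        node.end = counter
        counter += 1

    visit(root)
    return counter


# Fragment of a purchase order: PurchaseOrder > Buyer > Name > "John Smith"
text = Node("John Smith", is_text=True)
name = Node("Name", children=[text])
buyer = Node("Buyer", children=[name])
order = Node("PurchaseOrder", children=[buyer])
assign_labels(order)
print((name.begin, name.end), (text.begin, text.end))
```

Each begin tag, end tag and text segment consumes exactly one label, so the labels follow the order of the atoms in the textual form of the document.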

The advantage of labeling every node in the tree is apparent when
processing XPath queries using the child axis or the descendant axis. A child
axis query retrieves only the immediate children of a node, while a descendant
axis query retrieves all of the direct and indirect descendants of a node.

An example of a child axis XPath query is
"/PurchaseOrder/Buyer/Name" which means: "find the name of the buyer
listed in this purchase order."

An example of a descendant axis XPath query is "//LineItems//Name"
which means: "find all the names of items occurring anywhere inside this
purchase order."

An XPath navigation query of a tree that uses this labeling scheme is
converted to an interval containment query based upon the following
observation: for every two nodes x and y, x is an ancestor of y if and only if
the interval (Bx, Ex) includes the interval (By, Ey), or equivalently Bx < By and
Ey < Ex.

For example, as shown in Fig. 4, node 402 has the tag "LineItems"
with labels (6,30), and there are three nodes 404, 406 and 408 that have the tag
"Name" and labels (2,4), (8,10) and (23,25), respectively. By examining the
labels of nodes 406 and 408, it can be detected that node 402 with tag
"LineItems" is an ancestor of nodes 406 and 408 without navigating the XML
tree: 6<8 and 30>10, 6<23 and 30>25.
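The interval containment test can be written directly from the observation above. The following Python sketch uses the label values quoted for Fig. 4; the function name `is_ancestor` and the tuple representation of labels are illustrative assumptions.

```python
from typing import List, Tuple

Label = Tuple[int, int]  # (B, E): begin and end label of a node


def is_ancestor(x: Label, y: Label) -> bool:
    """True if the interval of x strictly contains the interval of y."""
    bx, ex = x
    by, ey = y
    return bx < by and ey < ex


line_items: Label = (6, 30)                       # node 402
names: List[Label] = [(2, 4), (8, 10), (23, 25)]  # nodes 404, 406, 408

# Only the "Name" nodes nested inside "LineItems" pass the test.
inside = [n for n in names if is_ancestor(line_items, n)]
print(inside)  # [(8, 10), (23, 25)]
```

The test touches only four integers per pair of nodes, which is why it can replace navigation of the tree when answering a descendant axis query.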

Since checking interval containment of the labels is much more 
efficient than navigating the XML tree to answer an XPath query, this labeling 
scheme can be advantageous for query processing. However, the vast majority 
of the proposed labeling schemes have poor performance when there is an 
update to an XML document. The naive approach of assigning labels from the 
integer domain, in sequential order, leads to re-labeling of half the nodes on 
average, even for a single node insertion to an XML document. 
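The "half the nodes on average" figure can be checked with a small calculation. The Python sketch below assumes that labels are consecutive integers and that every insertion position is equally likely; both assumptions, and the function names, are introduced here for illustration only.

```python
def relabels_for_insert(n: int, position: int) -> int:
    """Nodes that must be re-labeled when a node is inserted before
    index `position` in a list of n nodes labeled 0..n-1 without gaps."""
    if not 0 <= position <= n:
        raise ValueError("position out of range")
    # Every node at or after the insertion point shifts by one.
    return n - position


def average_relabels(n: int) -> float:
    """Average over all n + 1 possible insertion positions."""
    return sum(relabels_for_insert(n, p) for p in range(n + 1)) / (n + 1)


print(average_relabels(100))  # 50.0
```

With gap-free integer labels, the expected work per insertion therefore grows linearly with the size of the document.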

Alternatively, when labeling an XML document tree, gaps can be left
between successive labels. However, these gaps may be filled as additional
nodes are inserted into the XML document tree.

An exemplary embodiment of the present invention assigns and 
dynamically maintains these gaps to ensure optimal use of the tree. 

An exemplary embodiment of the invention provides an
order-preserving labeling scheme with an O(log N) amortized update cost and
O(log N) bits per label (where N is the number of nodes of the XML tree).

Consider an XML document D viewed in its textual representation as a
linear ordered list of tags (either begin tags or end tags) and text segments
(called atoms): $L_D = (a_1, a_2, \ldots, a_N)$. As shown in Fig. 5, an exemplary
embodiment of the present invention 500 includes an XML parser 510 which
assigns a numeric label $l_i$ to each atom $a_i$ that reflects the order of the
atoms in the list.

In order to compute these labels, this exemplary embodiment of the
invention 500 includes a label tree builder 520 that builds an auxiliary ordered
tree, that may be called a label tree, with at least N leaves (where N is the
number of atoms) and attaches the atoms to the first N leaves, starting from
the leftmost leaf.

This exemplary embodiment of the label tree builder 520 builds the
tree so that all of the leaves are at the same level (i.e., the number of edges
from the root to any leaf node is the same).
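As a rough illustration of such a builder, the Python sketch below builds a complete tree whose internal nodes all have f/s children, with just enough levels to hold N atoms. The nested-list representation, the function names, the use of `None` for unused leaves and the requirement that f be a multiple of s are all assumptions made for this sketch; f and s are the two shape parameters of the label tree.

```python
from typing import Any, List


def label_tree_height(n_atoms: int, f: int, s: int) -> int:
    """Smallest height h such that a complete (f/s)-ary tree has at
    least n_atoms leaves."""
    if f % s != 0 or f // s < 2:
        raise ValueError("this sketch assumes f is a multiple of s and f/s >= 2")
    d = f // s
    height, leaves = 0, 1
    while leaves < n_atoms:
        leaves *= d
        height += 1
    return height


def build_label_tree(atoms: List[Any], f: int, s: int) -> Any:
    """Return a nested-list tree with the atoms on the leftmost leaves;
    unused leaves are None. All leaves sit at the same depth."""
    d = f // s
    h = label_tree_height(len(atoms), f, s)
    level: List[Any] = list(atoms) + [None] * (d ** h - len(atoms))
    for _ in range(h):
        # Group every d consecutive nodes under a new parent.
        level = [level[i:i + d] for i in range(0, len(level), d)]
    return level[0]


print(build_label_tree(["<A>", "<B>", "</B>"], f=4, s=2))
```

With f = 4 and s = 2 the result is a complete binary tree, and three atoms leave one spare leaf on the right for a later insertion.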

There are two exemplary parameters, f and s, for the label tree, which
may be selected and which may determine the shape of the tree. For example,
the fanout of every node x in the label tree may be bounded by
$f/s \le \mathrm{fanout}(x) \le f$.

For any node x in the label tree, an exemplary embodiment of the
invention selects values for f and s (as explained later) and assigns a label
N(x) recursively in a top-down fashion to each document atom in accordance
with the following labeling algorithm:

$N(\mathrm{root}) = 0$; (1)

$N(x) = N(y) + i \times (f - 1)^{h(x)}$; and (2)

$0 \le i < f$ (3)

Where:

x is the i-th child of y;
f is the maximum fanout (maximum number of children per node); and
h(x) is the height of node x (number of edges from node x to a leaf
node).

An exemplary embodiment of the present invention may also assign
labels to the XML atoms based upon the labels assigned to their corresponding
leaves in the label tree.
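The top-down labeling can be sketched in Python as follows. The sketch reads equation (2) as N(x) = N(y) + i × (f − 1)^h(x) with children numbered from i = 0; this reading is an assumption, chosen because it gives the label 3 to the second height-1 node under a parent labeled 0 when f = 4, which matches the node "3" mentioned for the figures. The nested-list tree and the function name are likewise illustrative.

```python
from typing import Any, List, Tuple


def assign_leaf_labels(tree: Any, f: int, height: int,
                       label: int = 0) -> List[Tuple[Any, int]]:
    """Return (leaf, label) pairs in left-to-right order.

    `tree` is a nested list in which every leaf sits at depth `height`;
    `label` is N(y) for the node currently being expanded.
    """
    if height == 0:
        return [(tree, label)]
    # Children of this node have height (height - 1), so the step
    # between consecutive siblings is (f - 1) ** (height - 1).
    step = (f - 1) ** (height - 1)
    pairs: List[Tuple[Any, int]] = []
    for i, child in enumerate(tree):
        pairs.extend(assign_leaf_labels(child, f, height - 1, label + i * step))
    return pairs


# Complete binary label tree (f = 4, s = 2) of height 3 over eight atoms.
tree = [[["a", "b"], ["c", "d"]], [["e", "g"], ["h", "k"]]]
labels = [n for _, n in assign_leaf_labels(tree, f=4, height=3)]
print(labels)  # [0, 1, 3, 4, 9, 10, 12, 13]
```

The gaps between consecutive labels (for example between 1 and 3, or 4 and 9) are the slack that later insertions can use before any re-labeling of a larger sub-tree is needed.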

This exemplary embodiment of the present invention preserves the
order of the XML atoms, that is, the following Proposition holds:

Proposition 1: Let x be a leaf in a label tree
corresponding to an XML atom $a_i$, and y be a leaf
corresponding to atom $a_j$. Then $a_i$ appears before $a_j$ in the XML
document if and only if N(x) < N(y).

Initially, an exemplary embodiment of the present invention builds a
label tree for an existing XML document during a "bulk-loading mode" which
uses the algorithm described above. To maximize the capability to
accommodate further insertions, an exemplary embodiment of the present
invention builds a label tree based upon a complete (f/s)-ary tree.

Fig. 6A shows an XML tree 610 and Fig. 6B shows a corresponding
label tree 620 (having f = 4, s = 2) built by the bulk-loading method in
accordance with an exemplary embodiment of the present invention. For
purposes of illustration, all the nodes in this XML tree 610 are element nodes.

A bulk-loading algorithm shown in Fig. 5 builds the label tree of Fig.
6B in this exemplary embodiment. The label tree 620 shown in Fig. 6B is a
complete tree and, therefore, all its leaves are at the same level. An exemplary
embodiment of the present invention also maintains the label tree created
during the bulk-loading mode. For example, as shown in Fig. 7, a label tree
maintenance algorithm 700 in accordance with an exemplary embodiment of
the present invention may receive an XML Update Request and will provide
an Updated Label Tree, based on the Current Label Tree.

For the purposes of this detailed description, for every internal node x,
c(x) denotes the number of children of node x and l(x) denotes the number of
leaves in the sub-tree rooted at x.

Fig. 8 outlines an exemplary embodiment of a label tree maintenance
algorithm in accordance with the present invention. The algorithm starts at
step S800 where the control routine receives a command to insert leaf x after
leaf y. The control routine then continues to step S810.

In order to distribute the labels in a balanced manner, an exemplary
embodiment of the invention limits the maximum number of leaves that each
internal node t may have in its sub-tree according to:

$l_{max}(t) = s \times (f/s)^{h(t)}$ (4)



When a leaf node x is inserted in the label tree, l(t) increases by one for
every ancestor node t of node x. Thus, in step S810, shown in Fig. 8, an
exemplary embodiment of the present invention searches for the highest
ancestor node t for which the following holds true:

$l(t) = l_{max}(t)$ (5)



If no such node t exists ("Not found" in step S810), then the control
routine continues to step S860 and re-labels all of node x's siblings (sub-tree
rooted at parent of node y) according to the labeling algorithm described
above to assign a label to node x.

For example, Fig. 9B shows an exemplary embodiment of a label tree
920 with a text node "t1" 924 being inserted as the right sibling of a node
tagged by "C" in XML tree 910 of Fig. 9A. In the label tree 920 of Fig. 9B,
this insertion is illustrated by a leaf insertion 922 after the leaf labeled "/C."
The insertion is represented by dotted lines in Figs. 9A and 9B.

Since nodes at height 1 can accommodate $s \times (f/s) = f = 4$ leaves, this
leaf insertion does not cause its parent node "3" 926 to split. In this case, only
the siblings of the new leaf 922 may need to be relabeled.
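The search of step S810 can be sketched as a walk from the parent of the new leaf up to the root. The Python below is a sketch under stated assumptions: the `TNode` class and function names are hypothetical, the leaf limit is taken as s × (f/s)^h(t), f is assumed to be a multiple of s, and the comparison uses "equals or exceeds", as worded in the summary of the invention.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TNode:
    """Hypothetical internal node of a label tree."""
    height: int                       # h(t): edges from t down to a leaf
    leaves: int                       # l(t), already counting the new leaf
    parent: Optional["TNode"] = None


def l_max(f: int, s: int, height: int) -> int:
    """Leaf limit of a node at the given height: s * (f/s)**height."""
    return s * (f // s) ** height


def highest_full_ancestor(parent_of_new_leaf: TNode, f: int, s: int) -> Optional[TNode]:
    """Highest ancestor whose leaf count has reached its limit, or None."""
    found = None
    t: Optional[TNode] = parent_of_new_leaf
    while t is not None:
        if t.leaves >= l_max(f, s, t.height):
            found = t            # keep climbing: a higher match wins
        t = t.parent
    return found


root = TNode(height=2, leaves=5)
node = TNode(height=1, leaves=3, parent=root)
print(highest_full_ancestor(node, f=4, s=2))  # None: only siblings are re-labeled
```

With f = 4 and s = 2 the limits are 4 leaves at height 1 and 8 leaves at height 2, so a height-1 node holding 3 leaves after the insertion does not split.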

Otherwise, if in step S810 the control routine determines that such a
node t does exist ("Found"), the control routine continues to step S820. At
step S820, the control routine determines if node t is the root node. If not,
then the control routine continues to step S840 where node t is split into s
nodes and its sub-tree is replaced with s complete (f/s)-ary sub-trees with
height h(t) and with the same leaf sequence as the original sub-tree. The
control routine then continues to step S850 where re-labeling of the sub-tree
rooted at node t's parent is performed.

As an example, if an exemplary embodiment of the present invention
inserts a text node "t2" 1012 as the first child of node "C" 1014 as shown in
Figs. 10A and 10C, then that insertion corresponds to a leaf insertion after a
leaf marked "C" 1014 in the label tree 1010 as shown in Figs. 10B and 10D.
This insertion may then result in a split of the node labeled "3" 1022 of height
1 into two complete binary trees, and subsequent re-labeling of all the
descendants of the splitting node's parent 1024 (node 0). The changes of the
labels in the label tree 1020 shown in Fig. 10D are reflected in the labels of
XML tree 1010 of Fig. 10C. Note that all the leaves in the label tree 1010 of
Fig. 10D are still at the same level.
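The split of step S840 can be illustrated with a short Python sketch. It assumes that the node being split holds exactly s × (f/s)^h leaves, that f is a multiple of s, and that trees are represented as nested lists; these choices and the function names are illustrative only.

```python
from typing import Any, List


def build_complete(leaves: List[Any], d: int, height: int) -> Any:
    """Build a complete d-ary tree of the given height over the leaves."""
    if len(leaves) != d ** height:
        raise ValueError("a complete tree needs exactly d**height leaves")
    level: List[Any] = list(leaves)
    for _ in range(height):
        level = [level[i:i + d] for i in range(0, len(level), d)]
    return level[0]


def split_node(leaves: List[Any], f: int, s: int, height: int) -> List[Any]:
    """Replace one full sub-tree by s complete (f/s)-ary sub-trees of the
    same height, keeping the leaves in their original order."""
    d = f // s
    size = d ** height
    if len(leaves) != s * size:
        raise ValueError("node is not at its leaf limit")
    return [build_complete(leaves[k * size:(k + 1) * size], d, height)
            for k in range(s)]


# A height-1 node with f = 4, s = 2 that has reached 4 leaves
# becomes two binary sub-trees of height 1.
print(split_node(["w", "x", "y", "z"], f=4, s=2, height=1))
```

Because each new sub-tree keeps the height of the node it replaces, every leaf stays at the same level after the split; the parent simply gains s − 1 additional children and its sub-tree is then re-labeled.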

If, in step S820, the control routine determines that the root node itself
needs to split (although this happens very rarely), the control routine
continues to step S830 where a new root is created by the control routine. The
control routine then continues to step S850 where the control routine re-labels
the whole label tree and the height of the label tree increases by one.

The intuition behind the splitting and subsequent re-labeling that
happens in an exemplary embodiment of the present invention after an
insertion is that if the insertion causes the number of leaves of some sub-tree
to increase too much, this means that the labels of these leaves have become
very dense.

To remedy the situation of the labels becoming too dense, an
exemplary embodiment of the present invention splits the sub-tree and
re-labels the nodes of the sub-tree to provide more slack for this portion. In
this manner, insertions in this portion of the tree may be accommodated.

Since, in an exemplary embodiment of the present invention, the
number of leaves of any sub-tree is controlled and the density of the labels is
also controlled, the number of nodes involved in re-labeling amortized over
several insertions may also be controlled.

While this detailed description focuses on XML insertions, deletions
can also be handled by marking as "deleted" the corresponding leaves in the
label tree without any re-labeling. In this manner, an exemplary embodiment
of the present invention can reuse the labels of the deleted leaves for
subsequent insertions. The insertion into a deleted leaf may be accomplished
as follows: whenever a node x satisfies the splitting criterion, before an
exemplary embodiment of the present invention actually splits node x, the
sub-tree for node x is examined to determine if that sub-tree has any deleted
leaves. If the sub-tree does have deleted leaves, a new leaf may be inserted in
that sub-tree.

The following describes the properties of a label tree in accordance
with an exemplary embodiment of the present invention.
For any internal node x:

$(f/s)^{h(x)} \le l(x) \le s \times (f/s)^{h(x)}$; and (6)

$f/s \le c(x) \le f$ (7)
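These two properties can be checked mechanically. The Python sketch below reads them as (f/s)^h(x) ≤ l(x) ≤ s × (f/s)^h(x) and f/s ≤ c(x) ≤ f; whether the bounds are strict, and whether the root is exempt, is not settled here, so both points are assumptions of the sketch, as are the nested-list tree and the function name.

```python
from typing import Any


def check_invariants(tree: Any, f: int, s: int, height: int) -> int:
    """Return l(x) for the node `tree`; raise ValueError if any internal
    node violates the fanout or leaf-count bounds."""
    if height == 0:
        return 1                      # a leaf counts as one leaf
    d = f // s                        # this sketch assumes f is a multiple of s
    if not d <= len(tree) <= f:
        raise ValueError(f"fanout {len(tree)} outside [{d}, {f}]")
    leaves = sum(check_invariants(child, f, s, height - 1) for child in tree)
    if not d ** height <= leaves <= s * d ** height:
        raise ValueError(f"{leaves} leaves at height {height} out of bounds")
    return leaves


# f = 4, s = 2: every internal node needs 2 to 4 children.
print(check_invariants([["a", "b", "c"], ["d", "e"]], f=4, s=2, height=2))  # 5
```

A checker of this kind is mainly useful as a debugging aid after insertions and splits, since it confirms that the tree stays balanced.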

One data structure used traditionally in data management systems is a
B-tree. A B-tree is a balanced data structure that may provide for efficient
searching for keys in a database management system. By comparison, a label
tree in accordance with an exemplary embodiment of the present invention is
similar to a B-tree in that it may guarantee a certain occupancy of the internal
nodes such that the tree is balanced dynamically and the height is bounded by
O(log n), where n is the number of nodes in the tree. However, some of the
differences are:

1. The purpose of a B-tree is to speed up a query. It locates a
node in a top-down search. In contrast, a purpose of a label tree in
accordance with an exemplary embodiment of the present invention is to
assign a label for a newly inserted node which reflects the relative order of
nodes in the XML document, and, thus, speeds up regular expression queries;

2. The splitting criterion of the label tree may be based on the
number of leaves a node has, rather than the number of children; and

3. For each internal node x in an exemplary label tree in
accordance with the present invention:

$f/s \le c(x) \le f$ (8)

while for B-trees, s is fixed to 2.

An exemplary embodiment of the present invention also may avoid
cascade splitting, where splitting of one node causes another node to split.

We can infer the meaning of parameters f and s from the splitting
algorithm explained above, where
f defines the maximum fanout of the label tree, and
s determines the number of sub-trees created after a split. The
inventors call this factor the "split factor."

Both of these parameters contribute to the shape of the tree. An
exemplary method for assigning values to f and s will be described below.

For purposes of comparison with other labeling schemes, the following
description analyzes the costs associated with an exemplary embodiment of
the labeling scheme in accordance with the present invention, in terms of time
and space requirements.

Since disk accesses may be several orders of magnitude slower than
CPU computations, the cost is measured as the number of disk accesses, as
in a traditional database performance analysis.

Furthermore, it may be assumed that the entire label tree is stored on
disk and, for simplicity, no assumptions are made about nodes being cached.
In practice, many higher-level nodes may be cached most of the time, much
like in the case of B-trees, so these estimates are pessimistic. So, the cost is
measured as the number of nodes accessed for searching or re-labeling.

For queries, an exemplary embodiment of the label tree in accordance 
with the present invention does not incur any additional cost. In fact, if the 
labels are stored along with the XML nodes, the label of a given node may be 
retrieved without additional cost. 

The amortized cost for an insertion to an XML tree of size n is: 



$\mathrm{cost}(f,s,n) < \frac{2f}{\log \dots} \cdot (\log n + 1) + f - 1$ (9)



The maximum number of bits that may be required to encode a
label is:



$\mathrm{bits}(f,s,n) = \dots \cdot (\log n + 1)$ (10)



The above represent the worst-case amortized cost for one insertion
and the number of bits needed as a function of the current number of nodes in
the XML tree. Since f and s may be constant parameters, the labels of the
nodes of the XML tree may be maintained with O(log n) bits and O(log n)
amortized insertion cost.

O(log n) may be the worst-case lower bound for update cost. Since the
cost may be measured as the number of disk accesses, even a reduction by a
constant factor is helpful. Similarly, constants may be important for the
number of bits required. So, the insertion cost can be minimized by choosing
the optimum values for f and s.

Given an expected final size n of an XML document, parameters f and
s may be set according to different application needs to optimize the constant
factors of the cost and bits.

For example, to minimize cost:

object: min(cost) (11)



The critical points of the function "cost" may be found by solving the
following equations:

$\frac{\partial\,\mathrm{cost}}{\partial f} = 0$; and (12)

$\frac{\partial\,\mathrm{cost}}{\partial s} = 0$ (13)

For a given n, the above equations are solved to get the values of f_0 and
s_0.

Evaluating the second derivative at the point (f_0, s_0), we find out if the
solution is the minimum.

If the maximum number of bits B that is permitted is constrained,
the optimization problem becomes:

object: min(cost) (14)

subject to: bits ≤ B (15)

This is a problem of optimization under inequality constraints. First,
the unconstrained function cost is minimized. If the global minimum satisfies the
inequality constraints, then (f_0, s_0) is the desired answer. Otherwise, the
minimum must be located "on the line." That is, the local minimum is
achieved when the inequality constraints are active. The optimization problem
under inequality constraints may be converted to an optimization problem
under equality constraints, as follows:

object: min(cost) (16)

subject to: bits - B = 0 (17)



The Lagrange multiplier theory may be followed to solve this problem.
A Lagrange multiplier μ is introduced to form a new function:

$G(f,s,n) = \mathrm{cost}(f,s,n) + \mu \cdot \mathrm{bits}(f,s,n)$ (18)

The values of f, s and μ which give the conditional minima of cost are
found by solving the following equations:

$\frac{\partial G}{\partial f} = 0$ and $\frac{\partial G}{\partial s} = 0$ and $\mathrm{bits} - B = 0$ (19)
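
Where the derivatives are awkward to solve in closed form, the same selection of f and s can be approximated numerically. The Python sketch below is a brute-force grid search, not the Lagrange-multiplier method itself: it evaluates every candidate pair, discards those that exceed the bit budget, and keeps the cheapest. The callables `cost_fn` and `bits_fn` are hypothetical stand-ins for the cost and bits functions, and the restriction to integer pairs with s >= 2 and f > s is an assumption of this sketch.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple

Fn = Callable[[int, int, int], float]


def choose_parameters(
    cost_fn: Fn,
    bits_fn: Fn,
    n: int,
    f_values: Iterable[int],
    s_values: Iterable[int],
    max_bits: Optional[float] = None,
) -> Optional[Tuple[float, int, int]]:
    """Return (cost, f, s) with the lowest cost among feasible grid points.

    If max_bits is None the search is unconstrained; otherwise only pairs
    with bits_fn(f, s, n) <= max_bits are considered.
    """
    best = None
    for f, s in product(list(f_values), list(s_values)):
        if s < 2 or f <= s:  # assumption: f/s must exceed 1
            continue
        if max_bits is not None and bits_fn(f, s, n) > max_bits:
            continue
        c = cost_fn(f, s, n)
        if best is None or c < best[0]:
            best = (c, f, s)
    return best


if __name__ == "__main__":
    # Toy functions for demonstration only; they are not the patent's formulas.
    toy_cost = lambda f, s, n: (f - 8) ** 2 + (s - 2) ** 2
    toy_bits = lambda f, s, n: f
    print(choose_parameters(toy_cost, toy_bits, 1000, range(3, 17), range(2, 5)))
    print(choose_parameters(toy_cost, toy_bits, 1000, range(3, 17), range(2, 5), max_bits=6))
```

With the toy functions, the constrained call returns a point on the boundary of the bit budget, mirroring the "on the line" case described above.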



If the number of bits used to encode labels is less than the
machine word size, the label comparisons for queries are done by the
hardware and are, therefore, very fast. This is recommended for most cases.

Occasionally, when n is very large and the number of bits needed is
more than the machine word size, the number comparison should be done by
the software. In this case, the time needed for comparison is proportional to
the number of bits used. So, the overall cost for both queries and
updates needs to be minimized.

In this case, the proportion of queries versus updates, say, X and 1 - X,
should be known. The number of label comparisons needed for each query is
proportional to the size of the document, which may be denoted by t · n, where t
is a constant.

Let c_w be the cost to compare two numbers of machine word size w;
then the cost to compare two numbers with b bits is c_w · ⌈b/w⌉. Also, let d
be the cost of one disk access. Then, the total cost is:

$\mathrm{TotalCost} = X \cdot t \cdot n \cdot c_w \cdot \lceil \mathrm{bits}(f,s,n)/w \rceil + (1 - X) \cdot (d + \lceil \mathrm{bits}(f,s,n)/w \rceil \cdot c_w) \cdot \mathrm{cost}(f,s,n)$ (20)
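
The total cost can be evaluated directly once the label width and the update cost are known. The following Python sketch assumes that the update term is weighted by the update proportion 1 - X, and it takes the number of bits and the amortized update cost as plain inputs rather than computing them. All function and parameter names are hypothetical.

```python
import math


def comparison_cost(bits: int, word_size: int, c_w: float) -> float:
    """Cost of comparing two labels of the given width: c_w * ceil(bits / w)."""
    return c_w * math.ceil(bits / word_size)


def total_cost(
    query_fraction: float,  # X, the proportion of queries
    t: float,               # comparisons per query are taken to be t * n
    n: int,                 # document size
    c_w: float,             # cost of one machine-word comparison
    word_size: int,         # w, machine word size in bits
    d: float,               # cost of one disk access
    bits: int,              # label width, e.g. from bits(f, s, n)
    update_cost: float,     # amortized update cost, e.g. from cost(f, s, n)
) -> float:
    """Weighted cost of queries and updates, under the assumptions stated above."""
    compare = comparison_cost(bits, word_size, c_w)
    query_part = query_fraction * t * n * compare
    update_part = (1 - query_fraction) * (d + compare) * update_cost
    return query_part + update_part


if __name__ == "__main__":
    # One-word labels versus labels that spill into a second word.
    print(total_cost(0.5, 1, 100, 1.0, 64, 10.0, 64, 4.0))
    print(total_cost(0.5, 1, 100, 1.0, 64, 10.0, 65, 4.0))
```

The second call shows the jump in cost when a label no longer fits in one machine word.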



To minimize the overall cost, we need to solve the following 
equations: 



$\frac{\partial\,\mathrm{TotalCost}}{\partial f} = 0$ and $\frac{\partial\,\mathrm{TotalCost}}{\partial s} = 0$ (21)



In general, XML insertions involve a list of atoms (tags and text
segments) at one time. Although such an insertion can be implemented as a
sequence of independent leaf insertions in the label tree, the question is
whether it can be done more cheaply by inserting multiple leaves in the label tree at
the same time. The answer is yes. The total cost of inserting p atoms in batch
mode can be shown to be bounded by:



$\mathrm{cost}(f,s,p) = \frac{h + f - 1}{p} + \frac{2f}{s - 1} \cdot (h - h_1 + 2)$ (22)



where h_1 is the largest number such that:

$(s - 1) \cdot (f/s)^{h_1} < p$. (23)
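
The batch bound can be evaluated numerically, as in the Python sketch below. It rests on several assumptions: h is taken to be the height of the label tree, h_1 is taken to be an integer (possibly negative), and the condition defining h_1 is read as (s - 1) · (f/s)^h_1 < p, with the exponent being an interpretation of the printed formula. The function names are hypothetical.

```python
def largest_h1(f: int, s: int, p: int) -> int:
    """Largest integer h1 with (s - 1) * (f / s) ** h1 < p (assumed reading)."""
    if s < 2 or f <= s or p < 1:
        raise ValueError("this sketch assumes s >= 2, f > s and p >= 1")
    ratio = f / s
    h1 = 0
    # Move up while the next exponent still satisfies the inequality.
    while (s - 1) * ratio ** (h1 + 1) < p:
        h1 += 1
    # Move down if the current exponent does not satisfy it.
    while (s - 1) * ratio ** h1 >= p:
        h1 -= 1
    return h1


def batch_cost(f: int, s: int, p: int, h: int) -> float:
    """Bound for inserting p atoms in batch mode: (h + f - 1)/p + 2f/(s - 1) * (h - h1 + 2)."""
    h1 = largest_h1(f, s, p)
    return (h + f - 1) / p + (2 * f / (s - 1)) * (h - h1 + 2)


if __name__ == "__main__":
    # Larger batches raise h1 and therefore lower the bound.
    for p in (1, 10, 100):
        print(p, largest_h1(4, 2, p), batch_cost(4, 2, p, h=8))
```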



The above exemplary embodiment of the present invention provides a
labeling scheme which uses O(log n) bits to achieve O(log n) amortized
update cost and constant query cost. A second exemplary embodiment of the
present invention improves the update cost with a multiple-level labeling
scheme.

For example, the second exemplary embodiment of the present
invention follows a two-level labeling scheme. The second exemplary
embodiment of the present invention partitions the label tree into two parts such
that the second part includes all the leaves and the first part includes the rest.
Each leaf of the first part has log n children, which belong to the second part
and correspond to the tags of the XML data. The first part is constructed
according to the label tree construction algorithm described above.

The labels of the nodes of height h = 1 are maintained using the label
tree algorithm. Within each node x of height h = 1, the second exemplary
embodiment of the present invention assigns a label to each of the children of
node x in a monotonically increasing manner, such that the order of the
children is maintained within their parent. The label of a leaf may then be called
the second level label.

Upon a leaf insertion, let x be the parent of the inserted node. If the
number of leaves of x does not exceed log n, only the second level labels need
to be updated, without any effect on the first level label.

With O(log n) bits, a label can be assigned to the newly inserted node
without re-labeling any of its siblings. Otherwise, if x has log n children after
the insertion, node x may be split into two nodes x and x', and node x's children
are uniformly distributed over these two nodes. Then the second exemplary
embodiment of the present invention inserts node x' as a sibling of node x
following the insertion algorithm of the label tree described above.

Although some re-labeling happens to the first level labels, the second
level labels remain unchanged except those of node x's children. So, the
amortized cost of maintaining the first level labels is still O(log n). Notice that
since this cost is charged to node x's (log n)/2 newly inserted children, each
leaf has an amortized update cost of O(1).

In order to compute the relative order of two leaves, the label tree is
traversed upward to retrieve the label of the parent of each leaf and to
compose the complete label as a concatenation of a first level label with a
second level label, and the complete labels are compared.
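
The comparison of complete labels can be sketched in Python as follows. This is a minimal illustration, assuming that each leaf keeps a reference to its height-1 parent and that the concatenation of two fixed-width labels can be modeled as a tuple compared lexicographically. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Parent:
    """Node of height 1; its label is the first level label."""
    label: int


@dataclass
class Leaf:
    """Leaf of the label tree; its own label is the second level label."""
    parent: Parent
    label: int


def complete_label(leaf: Leaf) -> Tuple[int, int]:
    """Walk up to the parent and pair its label with the leaf's own label."""
    return (leaf.parent.label, leaf.label)


def precedes(a: Leaf, b: Leaf) -> bool:
    """True if leaf a comes before leaf b in document order."""
    return complete_label(a) < complete_label(b)


if __name__ == "__main__":
    p1, p2 = Parent(0), Parent(9)
    a, b, c = Leaf(p1, 5), Leaf(p2, 1), Leaf(p2, 7)
    print(precedes(a, b), precedes(b, c))
    # Re-labeling a parent changes the complete labels of its leaves
    # without touching any second level label.
    p2.label = 18
    print(complete_label(b), precedes(a, b))
```

Because the first level label is read through the parent, a first level re-labeling never rewrites the leaves, which is the source of both the cheaper updates and the extra lookup per comparison.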

Although this second exemplary embodiment of the present invention
achieves better update performance, the query performance may decrease. If
the single-level labeling scheme is used, the labels may be stored together with
the data, such that the label may be retrieved for free during queries. If the
two-level labeling scheme is used, only the second level label may be stored
with the XML data. To compare the order of two tags, their parents' labels
(i.e., the first level labels) will need to be retrieved by two disk accesses unless
the label tree can be kept in memory.

Therefore, the multi-level labeling scheme involves a tradeoff. Since
re-numberings can be done locally, the update cost decreases. However, since
the complete label does not propagate to the leaves, the label tree needs to be
traversed upward for each label comparison, which slows down the query
processing. So, unless the label tree can be kept in memory, multiple-level
labeling may slow down the query.

As an alternative to storing the label tree on disk, only the leaf labels
(with the XML nodes) may be stored because all the structural information of
the label tree is implicit in the labels themselves. Indeed, any leaf label is of
the form:

$N(x) = i_0 + i_1 (f-1)^1 + i_2 (f-1)^2 + \dots + i_{h-1} (f-1)^{h-1}$ (24)

Where:

i_0 is the relative position of node x in its siblings list, and

i_1 is x's parent's position among its siblings.

In other words, the base (f-1) digits of N(x) provide the Dewey number
of leaf node x. Based on this observation, the label tree incremental
maintenance algorithm may be run without the label tree.
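
The correspondence between a leaf label and its Dewey number can be sketched in Python as follows. The sketch assumes that the root label is 0, that every position is smaller than f - 1 so that it fits in a single base-(f-1) digit, and that the tree height is known. The function names are hypothetical.

```python
from typing import List


def dewey_digits(label: int, f: int, height: int) -> List[int]:
    """Split a leaf label into base-(f-1) digits, listed from the root down.

    The last entry is i_0 (the leaf's position among its siblings), the one
    before it is i_1 (its parent's position), and so on.
    """
    base = f - 1
    digits = []
    for _ in range(height):
        label, digit = divmod(label, base)
        digits.append(digit)  # collected as i_0, i_1, ...
    return digits[::-1]       # reversed into root-to-leaf order


def label_from_digits(digits: List[int], f: int) -> int:
    """Inverse of dewey_digits: rebuild the label from root-to-leaf positions."""
    base = f - 1
    label = 0
    for digit in digits:
        label = label * base + digit
    return label


if __name__ == "__main__":
    f, height = 4, 3
    label = 1 + 0 * 3 + 2 * 9  # i_0 = 1, i_1 = 0, i_2 = 2
    path = dewey_digits(label, f, height)
    print(label, path, label_from_digits(path, f))
```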

For example, in order to check if an internal node y satisfies the
splitting criterion, it suffices to count how many leaf labels are in the range
[N(y), N(y) + (f-1)^{h(y)}]. If the leaf labels are maintained in a B-tree whose
internal nodes also maintain counts, such range queries may be executed
efficiently (in logarithmic time).

Furthermore, once a splitting (virtual) node has been identified, the
leaf labels corresponding to the s complete f/s-ary (virtual) trees can be
computed easily and updated in place, on the labels identified by the range
query.
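
The range count can be sketched in Python with a sorted list and binary search standing in for the counted B-tree. The sketch makes three assumptions: the span of a node of height h is (f - 1)^h labels, the upper end of the range is treated as exclusive because that value would be the first label of the next sibling's sub-tree, and the splitting threshold is s · (f/s)^h. The function names are hypothetical.

```python
from bisect import bisect_left
from typing import List


def leaves_under(sorted_labels: List[int], node_label: int, f: int, height: int) -> int:
    """Count leaf labels in [node_label, node_label + (f - 1) ** height)."""
    low = bisect_left(sorted_labels, node_label)
    high = bisect_left(sorted_labels, node_label + (f - 1) ** height)
    return high - low


def meets_split_criterion(
    sorted_labels: List[int], node_label: int, f: int, s: int, height: int
) -> bool:
    """True if the virtual node holds at least s * (f / s) ** height leaves."""
    count = leaves_under(sorted_labels, node_label, f, height)
    return count >= s * (f / s) ** height


if __name__ == "__main__":
    labels = [0, 1, 3, 4, 9, 10]  # kept sorted, as a B-tree would keep them
    print(leaves_under(labels, 0, f=4, height=2))
    print(meets_split_criterion(labels, 0, f=4, s=2, height=2))
```

Each query costs two binary searches, which matches the logarithmic time mentioned above for a B-tree with counts.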

The inventors conducted a series of experiments that verify the
theoretical analysis. The experiments simulate a sequence of XML node
insertions at random positions. The update cost is measured as the number of
nodes that are relabeled or accessed during an insertion. Fig. 11 shows that
for fixed values of the parameters f and s, the amortized update cost is
logarithmic in the number of XML nodes in the document, as predicted by the
theoretical worst case upper bound.

Figs. 12 and 13 show how the amortized cost changes as the value of f
or s is changed, while keeping the other parameter and the size of the
document unchanged.

Fig. 14 shows, for a given number of bits, the minimal update
cost that may be achieved by setting the values of f and s. This is useful to get
the optimal update cost constrained by the number of bits being permitted. At
first sight, it may seem strange because after a certain point, the cost starts to
increase with the number of bits used. Although this result contradicts
intuition, it is reasonable with the adopted model. To compute the worst-case
upper bound with this model, assume that inserting a node causes its parent to
be relabeled.

Since the range of numbers used to denote a node's children is taken
to be log(f) and no mechanism (say, a label tree) is used to manipulate the labels of
the children, the cost of re-labeling one's children is bounded by f. But if the
number of bits being permitted is large enough, f bits may be used for the
range of labels for one node's children, which eliminates the need to re-label a
node if a new child is inserted.

The cost and the number of bits for this model are:



$\mathrm{cost} = \frac{\dots}{\log(f/s)} \cdot (\log n + 1)$ (25)



$\mathrm{bits} = \frac{f}{\log(f/s)} \cdot (\log n + 1)$ (26)



This model also provides an O(log n) update cost with O(log n) bits,
although the constant of cost is reduced if more bits are used. If the number
of bits being permitted is O(n), then f may be set to n to achieve O(1) update cost.

The labeling scheme of the present invention maintains the order of 
data items within an XML document. An exemplary embodiment of the 
present invention uses an auxiliary data structure, called label tree, which 
helps assign and update labels to data items. Additional exemplary 
embodiments of the present invention provide algorithms both for the bulk 
loading and for the incremental maintenance of the label tree. 

The present invention automatically adapts to uneven insertion rates in
different areas of the XML document. For example, in areas with heavy
insertion activity, the label tree adjusts itself by creating more slack between
labels, to better accommodate future insertions.

Yet another exemplary embodiment of the present invention
distributes the re-labeling work required by a node split evenly over a number
of insert operations, so as to eliminate any performance degradations.

While the invention has been described in terms of several exemplary
embodiments, those skilled in the art will recognize that the invention can be
practiced with modification.

Further, it is noted that Applicants' intent is to encompass equivalents
of all claim elements, even if amended later during prosecution.

CLAIMS

What is claimed is: 

1. A method of maintaining the order of nodes in a hierarchical
document, comprising: 

selecting a first parameter corresponding to a selected maximum 
number of children for each node for an auxiliary ordered tree; 

selecting a second parameter corresponding to a selected minimum 
number of children for each node for an auxiliary ordered tree; 

building the auxiliary ordered tree having at least as many leaves as 
atoms within said hierarchical document based upon the first and second 
parameters; 

attaching the atoms to the leaves of said auxiliary ordered tree; and 
labeling each of the nodes in the auxiliary ordered tree. 

2. The method of claim 1, wherein the labeling of the nodes in the
auxiliary tree is defined by:

N(root) = 0;
$N(x) = N(y) + i \cdot (f - 1)^{h(x)}$; and

0<i<f 

Where: 

N(x) is the label for node x; 
x is the i-th child of y;

f is the maximum number of children per node; and 
h(x) is the height of node x. 

3. The method of claim 1, further comprising assigning labels to the
atoms in the hierarchical document based upon the labels assigned to the 
corresponding leaves in the auxiliary ordered tree. 

4. The method of claim 1, further comprising storing the labels of the
leaves of the auxiliary ordered tree.

5. The method of claim 4, further comprising storing the remaining 
portion of the auxiliary ordered tree. 

6. The method of claim 1, further comprising partitioning the auxiliary
ordered tree into a first portion that comprises the leaves from the auxiliary 
ordered tree and a second portion that comprises the remaining portion of the 
auxiliary ordered tree. 

7. The method of claim 1, further comprising re-assigning labels to the 
atoms in the hierarchical document based upon the labels assigned to the 
corresponding leaves in the updated auxiliary ordered tree. 

8. A method of updating an auxiliary ordered tree having at least as many
leaves as atoms within a hierarchical document based upon a selected
maximum number of children for each node and a selected minimum number
of children for each node, comprising:

receiving a request to insert the hierarchical document with a new
atom at specific position;

inserting a new leaf in the auxiliary ordered tree based on the specific 
position of the corresponding atom in the hierarchical document; 

searching for the highest ancestor node of the new leaf that has a 
number of leaves that equals or exceeds the selected maximum number of 
leaves;

if no ancestor is found that equals or exceeds the selected maximum 
number of leaves then re-labeling the sub-tree rooted at the parent node of the 
new leaf; 

if an ancestor node is found that has a number of leaves that equals or
exceeds the selected maximum number of leaves, then
determining whether the ancestor node is the root node;

if the ancestor node is the root node, then creating a new root 
having a predetermined number of children; 

if the ancestor node is not the root node, then splitting the
ancestor node into complete sub-trees that have the same leaf sequence 
as the ancestor node's sub-tree; and 

reassigning labels in a top-down fashion in the sub-tree rooted 
at the parent of the ancestor node. 

9. The method of claim 8, wherein the predetermined maximum number 
of leaves is defined as: 

$L_{\max}(t) = s \cdot (f/s)^{h(t)}$

Where: 

f is a predetermined maximum fanout; and 
s is a predetermined split factor. 

10. The method of claim 8, wherein said insertion request comprises a
request to insert a plurality of consecutive atoms and wherein said updating 
minimizes the cost of inserting the new leaves that correspond to the plurality 
of consecutive atoms. 

11. The method of claim 10, wherein the plurality of consecutive atoms
comprise a plurality of tags and text segments. 

12. The method of claim 8, further comprising: 

receiving a request to delete an atom in the hierarchical document at a
specific position; and

marking the corresponding leaf in the auxiliary ordered tree as deleted. 

13. The method of claim 12, further comprising:

determining whether the sub-tree of an ancestor node that equals or
exceeds a predetermined maximum number of leaves has a sub-tree with a
deleted leaf; and

inserting a new leaf in place of the deleted leaf. 

14. A method of optimizing an auxiliary ordered tree having at least as
many leaves as atoms within a hierarchical document, the shape of the
auxiliary ordered tree being based upon a selected maximum number of
children for each node and a selected minimum number of children for each
node, the method comprising adjusting the maximum number of children for
each node and the selected minimum number of children for each node of the
auxiliary ordered tree based upon application requirements regarding one of
update cost, total cost of queries and updates, and the size of the labels.

15. A method of encoding an auxiliary ordered tree having at least as many
leaves as atoms within a hierarchical document, the shape of the auxiliary
ordered tree being based upon a selected maximum number of children for
each node and a selected minimum number of children for each node, the
method comprising minimizing space requirements using a virtual tree.

16. A system for maintaining the order of nodes in a hierarchical
document, comprising: 

means for selecting a first parameter corresponding to a selected 
maximum number of children for each node for an auxiliary ordered tree; 
means for selecting a second parameter corresponding to a selected
minimum number of children for each node for an auxiliary ordered tree;

means for building the auxiliary ordered tree having at least as many
leaves as atoms within said hierarchical document based upon the first and
second parameters;

means for attaching the atoms to the leaves of said auxiliary ordered
tree; and

means for labeling each of the nodes in the auxiliary ordered tree. 

17. The system of claim 16, wherein the means for labeling each of the
nodes bases the labeling upon:

N(root) = 0;
$N(x) = N(y) + i \cdot (f - 1)^{h(x)}$; and

0<i<f

Where:

N(x) is the label for node x;

x is the i-th child of y;

f is the maximum number of children per node; and
h(x) is the height of node x.

18. The system of claim 16, further comprising means for storing the
labels of the leaves of the auxiliary ordered tree.

19. The system of claim 18, further comprising means for storing the
remaining portion of the auxiliary ordered tree.

20. The system of claim 16, further comprising means for partitioning the
auxiliary ordered tree into a first portion that comprises the leaves from the
auxiliary ordered tree and a second portion that comprises the remaining
portion of the auxiliary ordered tree.

21. The system of claim 16, further comprising means for re-assigning
labels to the atoms in the hierarchical document based upon the labels 
assigned to the corresponding leaves in the updated auxiliary ordered tree. 

22. A recording medium storing a program for making a computer
maintain the order of nodes in a hierarchical document, the program
comprising:

instructions for selecting a first parameter corresponding to a selected
maximum number of children for each node for an auxiliary ordered tree;

instructions for selecting a second parameter corresponding to a
selected minimum number of children for each node for an auxiliary ordered
tree;

instructions for building the auxiliary ordered tree having at least as
many leaves as atoms within said hierarchical document based upon the first
and second parameters;

instructions for attaching the atoms to the leaves of said auxiliary 
ordered tree; and 

instructions for labeling each of the nodes in the auxiliary ordered tree. 



23. The medium of claim 22, wherein the instructions for labeling each of
the nodes is based upon:

N(root) = 0;
$N(x) = N(y) + i \cdot (f - 1)^{h(x)}$; and

0<i<f

Where:

N(x) is the label for node x;
x is the i-th child of y;

f is the maximum number of children per node; and
h(x) is the height of node x.

24. The medium of claim 22, further comprising instructions for assigning
labels to the atoms in the hierarchical document based upon the labels
assigned to the corresponding leaves in the auxiliary ordered tree.

25. The medium of claim 22, further comprising instructions for storing
the labels of the leaves of the auxiliary ordered tree.

26. The medium of claim 25, further comprising instructions for storing
the remaining portion of the auxiliary ordered tree. 

27. The medium of claim 22, further comprising instructions for 
partitioning the auxiliary ordered tree into a first portion that comprises the 
leaves from the auxiliary ordered tree and a second portion that comprises the 
remaining portion of the auxiliary ordered tree. 

28. The medium of claim 22, further comprising instructions for
re-assigning labels to the atoms in the hierarchical document based upon the
labels assigned to the corresponding leaves in the updated auxiliary ordered 
tree. 

29. A system for updating an auxiliary ordered tree having at least as many 
leaves as atoms within a hierarchical document based upon a selected 
maximum number of children for each node and a selected minimum number 
of children for each node, comprising: 

means for receiving a request to insert the hierarchical document with 
a new atom at specific position; 

means for inserting a new leaf in the auxiliary ordered tree based on 
the specific position of the corresponding atom in the hierarchical document; 

means for searching for the highest ancestor node of the new leaf that 
has a number of leaves that equals or exceeds the selected maximum number 
of leaves; 

if no ancestor is found that equals or exceeds the selected maximum
number of leaves then means for re-labeling the sub-tree rooted at the parent
node of the new leaf;

if an ancestor node is found that has a number of leaves that equals or
exceeds the selected maximum number of leaves, then

means for determining whether the ancestor node is the root
node;

if the ancestor node is the root node, then means for creating a
new root having a predetermined number of children;

if the ancestor node is not the root node, then means for
splitting the ancestor node into complete sub-trees that have the same
leaf sequence as the ancestor node's sub-tree; and

means for reassigning labels in a top-down fashion in the
sub-tree rooted at the parent of the ancestor node.

30. A recording medium storing a program for making a computer update
an auxiliary ordered tree having at least as many leaves as atoms within a
hierarchical document based upon a selected maximum number of children for
each node and a selected minimum number of children for each node,
comprising:

instructions for receiving a request to insert the hierarchical document
with a new atom at specific position;

instructions for inserting a new leaf in the auxiliary ordered tree based
on the specific position of the corresponding atom in the hierarchical
document;

instructions for searching for the highest ancestor node of the new leaf 
that has a number of leaves that equals or exceeds the selected maximum 
number of leaves; 

if no ancestor is found that equals or exceeds the selected maximum
number of leaves then instructions for re-labeling the sub-tree rooted at the
parent node of the new leaf;

if an ancestor node is found that has a number of leaves that equals or
exceeds the selected maximum number of leaves, then

instructions for determining whether the ancestor node is the
root node;

if the ancestor node is the root node, then instructions for
creating a new root having a predetermined number of children;

if the ancestor node is not the root node, then instructions for
splitting the ancestor node into complete sub-trees that have the same
leaf sequence as the ancestor node's sub-tree; and

instructions for reassigning labels in a top-down fashion in the
sub-tree rooted at the parent of the ancestor node.


ABSTRACT OF THE DISCLOSURE 



A method, a system and recording medium for maintaining the order
of nodes in a hierarchical document. The method may select the maximum
and the minimum number of children for each node, build an auxiliary ordered
tree having at least as many leaves as atoms within the hierarchical document
based upon the selected maximum and minimum number of children for each
node, attach the atoms to the leaves of the auxiliary ordered tree, and label
each of the nodes in the auxiliary ordered tree.